Python offers a range of libraries for creating different types of visualizations. The Python Graph Gallery is a collection of hundreds of plots made with Python. The most commonly used libraries include Matplotlib, Seaborn, and Plotly.
Today, I would like to give you a simple guide to making network diagrams with Python.
# import library
import pandas as pd
import numpy as np
import plotly
import plotly.graph_objs as go
import plotly.express as px
import networkx as nx
import math
import matplotlib.pyplot as plt
import json
Install the libraries in the terminal using:
pip install package_name
or
conda install package_name
To use Plotly in JupyterLab, we need to install an extension. Run the following command:
jupyter labextension install jupyterlab-plotly@4.13.0
If it's still not showing anything, please refer to this page and the getting started doc.
import plotly.io as pio
pio.renderers.default='notebook'
%%HTML
<script src="require.js"></script>
plotly.offline.init_notebook_mode()
clean_rescured_table = pd.read_csv('clean_rescured_table.csv')
clean_rescured_table.head()
We can create an edge list from the data by using 'groupby' and counting the occurrences of every unique edge.
from_to = clean_rescured_table.groupby(['from', 'to'])['from'].count()
from_to
# The index contains node pair of each unique edge
from_to.index
# Create an empty dataframe with column names defined
from_to_df = pd.DataFrame(columns = ["from", "to", "count"])
# Append data from the grouped series to the empty dataframe
for i in range(len(from_to.index)):
    # .iloc reads the count by position; from_to[i] would be treated as a label lookup
    from_to_df.loc[i] = [from_to.index[i][0], from_to.index[i][1], from_to.iloc[i]]
# The final edge list
from_to_df
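The same edge list can probably be built without an explicit loop. The sketch below is one way to do it, assuming a table with 'from' and 'to' columns; the `records` data is a made-up stand-in for the real CSV, and the name `edge_list` is only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the real table: one row per recorded case.
records = pd.DataFrame({
    "from": ["A", "A", "B", "A", "C"],
    "to":   ["B", "B", "C", "C", "A"],
})

# size() counts the rows for every (from, to) pair; reset_index() turns the
# MultiIndex back into ordinary columns and names the count column.
edge_list = records.groupby(["from", "to"]).size().reset_index(name="count")
print(edge_list)
```

A side effect of this approach is that the 'count' column keeps an integer dtype, which is convenient when it is later passed to numeric functions.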
The Python package NetworkX allows you to create, manipulate, and study complex networks.
The most basic network can be drawn using from_pandas_edgelist, which returns a graph from a Pandas DataFrame containing an edge list. The DataFrame should have at least two columns of nodes and zero or more columns of edge attributes. Each row represents one edge instance.
# Build the network
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to')
# Draw the network
nx.draw(G, with_labels=True)
plt.show()
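To make the "zero or more columns of edge attributes" part concrete, here is a small sketch on made-up data (the `edges` frame and the name `G_demo` are assumptions for illustration). Passing `edge_attr` asks from_pandas_edgelist to copy that column onto each edge.

```python
import networkx as nx
import pandas as pd

# Hypothetical three-row edge list with one attribute column.
edges = pd.DataFrame({
    "from": ["A", "A", "B"],
    "to": ["B", "C", "C"],
    "count": [2, 1, 5],
})

# edge_attr="count" stores the 'count' value of each row on the matching edge.
G_demo = nx.from_pandas_edgelist(edges, "from", "to", edge_attr="count")

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
print(G_demo["A"]["B"]["count"])  # attribute lookup for the edge A-B
```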
As you can already see, the network we just built looks quite messy. Fortunately, we can use the arguments of the draw() function to customize the style and layout of our network diagram. We can change the node size, color, shape and transparency, the label font, the edge color and width, and the layout:
# Set the graph size
fig, ax = plt.subplots(figsize=(10, 10))
# Draw network with Custom settings
nx.draw(G,
        with_labels=True,
        node_size=1000,        # default = 300
        node_color="skyblue",  # Can be string or rgb(a) tuple
        node_shape="8",        # One of 'so^>v<dph8'
        alpha=0.7,             # Transparency
        font_size=10,
        font_color='black',
        font_weight='bold',
        edge_color='orange',
        width=3,
        # Uncomment and try the layout settings below. Which one do you prefer?
        #pos=nx.fruchterman_reingold_layout(G)
        pos=nx.circular_layout(G)
        #pos=nx.random_layout(G)
        #pos=nx.spectral_layout(G)
        #pos=nx.spring_layout(G)
        )
plt.show()
NetworkX offers G.degree() which measures the total number of edges connected to a particular vertex. This example shows the two common ways to visualize the distribution of the degree of nodes: a degree-rank plot and a degree histogram.
# Set the graph size
fig, ax = plt.subplots(1, 2, figsize=(10,5))
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to')
degree_sequence = sorted((d for n, d in G.degree()), reverse=True)
ax[0].plot(degree_sequence, "b-", marker="o")
ax[1].bar(*np.unique(degree_sequence, return_counts=True))
plt.show()
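To see what the two plots are built from, the sketch below runs the same degree computation on a tiny hand-made graph (the graph and the name `G_demo` are assumptions for illustration, not the article's data).

```python
import networkx as nx
import numpy as np

# Hypothetical four-node graph: A is connected to B, C and D; B is also connected to C.
G_demo = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])

# Degree-rank plot input: the degrees sorted from largest to smallest.
degree_sequence = sorted((d for _, d in G_demo.degree()), reverse=True)

# Degree histogram input: each distinct degree and how many nodes have it.
values, counts = np.unique(degree_sequence, return_counts=True)

print(degree_sequence)
print(dict(zip(values.tolist(), counts.tolist())))
```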
It is very important to distinguish between directed and undirected networks. NetworkX offers the class DiGraph() for defining directed networks, whereas the class Graph() is used for defining undirected networks.
# Build the directed network
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to', create_using=nx.DiGraph())
# Set the graph size
fig, ax = plt.subplots(figsize=(10, 10))
# Draw network with Custom settings
nx.draw(G,
        arrows=True,  # Add arrows to the network
        with_labels=True,
        node_size=1000,
        node_color="skyblue",
        node_shape="8",
        alpha=0.7,
        font_size=10,
        font_color='black',
        font_weight='bold',
        edge_color='orange',
        width=3,
        pos=nx.circular_layout(G)
        )
plt.show()
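The practical difference shows up as soon as two nodes are linked in both directions. A minimal sketch on made-up node names (`G_undirected` and `G_directed` are illustrative names):

```python
import networkx as nx

pairs = [("A", "B"), ("B", "A")]  # hypothetical flows in both directions

G_undirected = nx.Graph()
G_undirected.add_edges_from(pairs)  # A-B and B-A collapse into a single edge

G_directed = nx.DiGraph()
G_directed.add_edges_from(pairs)    # A->B and B->A are kept as two edges

print(G_undirected.number_of_edges(), G_directed.number_of_edges())
# A directed graph also separates incoming from outgoing connections.
print(G_directed.in_degree("A"), G_directed.out_degree("A"))
```

For data where the direction carries meaning, such as a 'from' and a 'to' column, the directed graph avoids merging opposite flows into one edge.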
Is there a better way of showing the weight of an edge than a label? Of course! We can map the weight to the width or color of the edge, so that a deeper color or thicker line indicates a stronger connection between nodes.
edge_cmap allows us to map a variable to a colormap; the available options can be found here: https://matplotlib.org/3.5.0/tutorials/colors/colormaps.html. An example is:

fig, ax = plt.subplots(figsize=(10, 10))
# Build your graph, keeping 'count' as an edge attribute
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to', edge_attr='count', create_using=nx.DiGraph())
# Read the counts back in G.edges() order, which can differ from the row order of from_to_df
edge_counts = [d['count'] for _, _, d in G.edges(data=True)]
# Draw network with Custom settings
nx.draw(G,
        arrows=True,
        with_labels=True,
        node_size=1000,
        node_color="skyblue",
        node_shape="8",
        alpha=0.7,
        font_size=10,
        font_color='black',
        font_weight='bold',
        edge_color=edge_counts,  # Map variable to edge color
        edge_cmap=plt.cm.Greys,  # Colormap for mapping intensities of edges
        width=3,
        pos=nx.circular_layout(G)
        )
plt.show()
Note that when mapping the variable to edge width, the value of the variable might be too big for the width, and we could end up with a network hidden behind really thick edges. Therefore, we want to scale the values to a certain range. There are many ways of doing so; the simplest is to take the logarithm. Be careful: we need the log function from the numpy library rather than the math library, because math.log expects a single value whereas numpy.log can compute the log of a sequence of numbers.
fig, ax = plt.subplots(figsize=(10, 10))
# Build your graph, keeping 'count' as an edge attribute
G = nx.from_pandas_edgelist(from_to_df, 'from', 'to', edge_attr='count', create_using=nx.DiGraph())
# Take the log of the counts in G.edges() order, which can differ from the row order of from_to_df
edge_widths = np.log([d['count'] for _, _, d in G.edges(data=True)])
# Draw network with Custom settings
nx.draw(G,
        arrows=True,
        with_labels=True,
        node_size=1000,
        node_color="skyblue",
        node_shape="8",
        alpha=0.7,
        font_size=10,
        font_color='black',
        font_weight='bold',
        edge_color='orange',
        width=edge_widths,  # Map variable to edge width
        pos=nx.circular_layout(G)
        )
plt.show()
Much better!
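One thing to watch with a plain log: the log of a count of 1 is 0, so an edge that occurs only once gets a width of 0. A possible alternative is to log-scale the counts and then stretch them onto a fixed width range. The helper below is a sketch; the function name `scale_widths` and the default range of 0.5 to 6.0 are my own assumptions, not part of NetworkX.

```python
import numpy as np

def scale_widths(counts, min_width=0.5, max_width=6.0):
    """Log-scale the counts, then map them linearly onto [min_width, max_width]."""
    logged = np.log1p(np.asarray(counts, dtype=float))  # log(1 + x) keeps count 1 above zero
    span = logged.max() - logged.min()
    if span == 0:
        # All counts are equal: give every edge the middle of the range.
        return np.full(logged.shape, (min_width + max_width) / 2)
    return min_width + (logged - logged.min()) / span * (max_width - min_width)

# Hypothetical counts spanning two orders of magnitude.
widths = scale_widths([1, 3, 20, 150])
print(widths)
```

The resulting array can be passed to the width argument of nx.draw, provided the counts were read in G.edges() order.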
It still looks like we have too many edges showing. We could always filter the data to show only some specific connections. What if we want to keep all the edges but highlight links with significant weight? Can you think of a way of doing so?
Networks are often tied to geographic locations. In our case, the nodes are provinces of China and the link weights are the numbers of human trafficking victims. Visualising such a network on a base map is often a good way to convey the geography directly.
The very first step: download GeoJSON-formatted geometry information for the regions/countries we would like to draw.
You can find GeoJSON files for most countries here.
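One possible answer, sketched on made-up data: keep every edge, but build a per-edge color list from a threshold. The cut-off of 10, the colors, and the names `edges`, `G_demo` and `edge_colors` are all assumptions chosen for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list with one heavy link.
edges = pd.DataFrame({
    "from": ["A", "B", "C"],
    "to": ["B", "C", "A"],
    "count": [2, 40, 7],
})
G_demo = nx.from_pandas_edgelist(edges, "from", "to", edge_attr="count",
                                 create_using=nx.DiGraph())

threshold = 10  # hypothetical cut-off for a "significant" link

# Build the colors in G_demo.edges() order so each color lines up with its edge.
edge_colors = ["red" if d["count"] >= threshold else "lightgrey"
               for _, _, d in G_demo.edges(data=True)]
print(dict(zip(G_demo.edges(), edge_colors)))
```

The list can then be passed as edge_color to nx.draw, so the light edges stay visible as context while the heavy ones stand out.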
with open('china.json') as file:
    china = json.load(file)
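Before plotting, it can help to check that the region names in the data match the identifiers in the GeoJSON, because px.choropleth matches each value in locations against a feature identifier (the feature's id by default, or another field chosen with featureidkey). The sketch below uses a tiny hand-made GeoJSON; the feature ids, the 'name' property and the location names are all hypothetical, so inspect the real file to see which field it uses.

```python
# Hypothetical miniature GeoJSON with the same top-level layout as a real file.
geojson_demo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Region1", "properties": {"name": "Region1"}, "geometry": None},
        {"type": "Feature", "id": "Region2", "properties": {"name": "Region2"}, "geometry": None},
    ],
}

# Names as they might appear in the 'from' / 'to' columns (made up here).
location_names = ["Region1", "Region2", "Region3"]

# Collect the ids the map knows about and report names that would not be drawn.
feature_ids = [feature.get("id") for feature in geojson_demo["features"]]
missing = sorted(set(location_names) - set(feature_ids))
print(missing)
```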
The most basic map could be created by using plotly.express.choropleth():
fig = px.choropleth(from_to_df,
                    geojson=china,
                    locations=from_to_df['from'].unique(),
                    color=range(33),
                    color_continuous_scale="turbo",
                    basemap_visible=True,
                    )
fig.update_geos(fitbounds="locations", visible=True) # zoom the map to the locations we defined
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
...and if we want to map a variable to the colormap:
# Create a dataframe of number of victims found in each province
to_df = clean_rescured_table.groupby(['to'])['to'].count().to_frame()
to_df.head(5)
fig = px.choropleth(to_df,
                    geojson=china,
                    locations=to_df.index,
                    color=to_df['to'],  # variable we want to present on the map
                    color_continuous_scale="Purples",
                    basemap_visible=True,
                    )
fig.update_geos(fitbounds="locations", visible=True) # zoom the map to the locations we defined
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
Lastly, we want to visualize the network on the map! The first step is to define the longitude and latitude of our nodes (provinces). Since I did not find such a file (I found locations of cities in China, but each province has so many cities) and there are only 33 nodes, I just googled the location of each province and manually created a CSV.
# CSV I manually created
China_provinces = pd.read_csv('China_Provinces.csv')
Then we create a base map, as we did before:
fig = go.Figure()
# Create a base map
fig = px.choropleth(from_to_df,
                    geojson=china,
                    locations=from_to_df['from'].unique(),
                    color=range(33),
                    basemap_visible=False,
                    color_continuous_scale="turbo"
                    )
Next, we draw the edges one by one in a loop. For each edge, we look up its start and end nodes and then the lat and lon of each of them. The go.Scattergeo trace draws scatter points or lines on a geographic map from the lon/lat pairs provided.
for i in range(len(from_to_df)):
    # Define the start and end nodes of a connection
    start = from_to_df['from'][i]
    end = from_to_df['to'][i]
    # Define the lat and lon of each pair of start and end nodes
    start_lon = China_provinces.loc[China_provinces['provinces'] == start]['lon'].item()
    end_lon = China_provinces.loc[China_provinces['provinces'] == end]['lon'].item()
    start_lat = China_provinces.loc[China_provinces['provinces'] == start]['lat'].item()
    end_lat = China_provinces.loc[China_provinces['provinces'] == end]['lat'].item()
    # Add the link between the nodes
    fig.add_trace(
        go.Scattergeo(
            lon=[start_lon, end_lon],
            lat=[start_lat, end_lat],
            mode='lines',
            line=dict(width=math.log(from_to_df['count'][i]), color='black'),
        )
    )
fig.update_geos(fitbounds="locations", visible=True)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_coloraxes(showscale=False)
fig.show()
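The loop above filters the coordinates table four times per edge. One possible alternative is to build a lookup dictionary once and read from it inside the loop. The sketch below assumes a table with 'provinces', 'lon' and 'lat' columns like the CSV; the node names and coordinates are placeholders, not real data.

```python
import pandas as pd

# Hypothetical coordinates table with the same columns as the hand-made CSV.
provinces_demo = pd.DataFrame({
    "provinces": ["P1", "P2"],
    "lon": [100.0, 110.0],
    "lat": [30.0, 35.0],
})

# One dictionary per node: {"P1": {"lon": 100.0, "lat": 30.0}, ...}
coords = provinces_demo.set_index("provinces")[["lon", "lat"]].to_dict("index")

# Inside the edge loop, the four filtered lookups become plain dictionary reads.
start, end = "P1", "P2"
edge_lon = [coords[start]["lon"], coords[end]["lon"]]
edge_lat = [coords[start]["lat"], coords[end]["lat"]]
print(edge_lon, edge_lat)
```

A node name missing from the table raises a KeyError here, which makes spelling mismatches between the two files easy to spot.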
And if you want to add the nodes to the map as well:
fig.add_trace(go.Scattergeo(
    lon=China_provinces['lon'],
    lat=China_provinces['lat'],
    hoverinfo='text',
    text=China_provinces['provinces'],
    mode='markers',
    marker=dict(
        size=7,
        color='white',
        line=dict(
            width=3,
            color='rgba(68, 68, 68, 0)'
        )
    )))
fig.update_geos(fitbounds="locations", visible=True)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.update_coloraxes(showscale=False)
fig.show()